Overview

Dataset statistics

Number of variables: 16
Number of observations: 1043910
Missing cells: 1105819
Missing cells (%): 6.6%
Duplicate rows: 5325
Duplicate rows (%): 0.5%
Total size in memory: 733.3 MiB
Average record size in memory: 736.6 B
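
The percentages and the average record size follow from the raw counts. A minimal sketch of that arithmetic, assuming "cells" means variables times observations and 1 MiB = 1024 * 1024 bytes (the variable names are illustrative only):

```python
# Figures copied from the dataset statistics of this report.
n_vars = 16
n_obs = 1_043_910
missing_cells = 1_105_819
duplicate_rows = 5_325
total_mib = 733.3

# Assumption: a "cell" is one (row, variable) pair.
total_cells = n_vars * n_obs
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_obs

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_mib * 1024 * 1024 / n_obs

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```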

Variable types

Categorical: 10
Numeric: 6

Alerts

Dataset has 5325 (0.5%) duplicate rows [Duplicates]
CountyName has a high cardinality: 55 distinct values [High cardinality]
DateOfAssessment has a high cardinality: 5282 distinct values [High cardinality]
YearofConstruction is highly correlated with BerRating and 1 other field [High correlation]
BerRating is highly correlated with YearofConstruction and 1 other field [High correlation]
GroundFloorArea(sq m) is highly correlated with TotalDeliveredEnergy [High correlation]
CO2Rating is highly correlated with YearofConstruction and 2 other fields [High correlation]
TotalDeliveredEnergy is highly correlated with GroundFloorArea(sq m) and 1 other field [High correlation]
YearofConstruction is highly correlated with BerRating [High correlation]
BerRating is highly correlated with YearofConstruction and 2 other fields [High correlation]
CO2Rating is highly correlated with BerRating and 1 other field [High correlation]
TotalDeliveredEnergy is highly correlated with BerRating and 1 other field [High correlation]
YearofConstruction is highly correlated with BerRating and 1 other field [High correlation]
BerRating is highly correlated with YearofConstruction and 1 other field [High correlation]
CO2Rating is highly correlated with YearofConstruction and 1 other field [High correlation]
MainSpaceHeatingFuel is highly correlated with MainWaterHeatingFuel [High correlation]
MainWaterHeatingFuel is highly correlated with MainSpaceHeatingFuel [High correlation]
CountyName is highly correlated with DwellingTypeDescr and 2 other fields [High correlation]
DwellingTypeDescr is highly correlated with CountyName [High correlation]
YearofConstruction is highly correlated with EnergyRating [High correlation]
EnergyRating is highly correlated with YearofConstruction and 3 other fields [High correlation]
BerRating is highly correlated with CO2Rating and 1 other field [High correlation]
CO2Rating is highly correlated with BerRating and 1 other field [High correlation]
MainSpaceHeatingFuel is highly correlated with CountyName and 1 other field [High correlation]
MainWaterHeatingFuel is highly correlated with CountyName and 1 other field [High correlation]
VentilationMethod is highly correlated with EnergyRating [High correlation]
StructureType is highly correlated with EnergyRating [High correlation]
InsulationType is highly correlated with EnergyRating [High correlation]
TotalDeliveredEnergy is highly correlated with BerRating and 1 other field [High correlation]
MainSpaceHeatingFuel has 15648 (1.5%) missing values [Missing]
MainWaterHeatingFuel has 15648 (1.5%) missing values [Missing]
InsulationType has 232787 (22.3%) missing values [Missing]
InsulationThickness has 232787 (22.3%) missing values [Missing]
TotalDeliveredEnergy has 598110 (57.3%) missing values [Missing]
BerRating is highly skewed (γ1 = 51.73654348) [Skewed]
CO2Rating is highly skewed (γ1 = 68.08527183) [Skewed]
TotalDeliveredEnergy is highly skewed (γ1 = 87.22682973) [Skewed]
InsulationThickness has 128896 (12.3%) zeros [Zeros]

Reproduction

Analysis started: 2022-07-20 22:28:21.343380
Analysis finished: 2022-07-20 22:29:08.164647
Duration: 46.82 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
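
The reported duration is the difference between the two timestamps. A small sketch of that check using only the standard library (it assumes both timestamps are in the same time zone):

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2022-07-20 22:28:21.343380")
finished = datetime.fromisoformat("2022-07-20 22:29:08.164647")

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```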

Variables

CountyName
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 55
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 66.6 MiB

Co. Cork: 94961
Co. Dublin: 80411
Co. Kildare: 47843
Co. Meath: 41047
Co. Galway: 40009
Other values (50): 739639

Length

Max length: 14
Median length: 13
Mean length: 9.867811401
Min length: 8

Characters and Unicode

Total characters: 10301107
Distinct characters: 46
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
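
As an illustration of those character properties, the sketch below counts the Unicode general categories in one value using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and this is not necessarily how the profiling tool computes its tables. The two-letter codes correspond to the category names used in this report: `Lu` is Uppercase Letter, `Ll` Lowercase Letter, `Zs` Space Separator, `Po` Other Punctuation, `Nd` Decimal Number and `Pd` Dash Punctuation.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# "Co. Cork" has 2 uppercase letters, 4 lowercase letters,
# 1 punctuation mark and 1 space.
counts = category_counts("Co. Cork")
print(counts)
```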

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Co. Donegal
2nd row: Co. Kildare
3rd row: Co. Dublin
4th row: Dublin 11
5th row: Dublin 22

Common Values

Value | Count | Frequency (%)
Co. Cork | 94961 | 9.1%
Co. Dublin | 80411 | 7.7%
Co. Kildare | 47843 | 4.6%
Co. Meath | 41047 | 3.9%
Co. Galway | 40009 | 3.8%
Co. Wexford | 34283 | 3.3%
Co. Kerry | 32755 | 3.1%
Co. Tipperary | 32565 | 3.1%
Co. Wicklow | 31870 | 3.1%
Co. Donegal | 31717 | 3.0%
Other values (45) | 576449 | 55.2%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
co | 741717 | 35.5%
dublin | 313810 | 15.0%
cork | 117311 | 5.6%
city | 68794 | 3.3%
galway | 56026 | 2.7%
kildare | 47843 | 2.3%
limerick | 44677 | 2.1%
meath | 41047 | 2.0%
wexford | 34283 | 1.6%
kerry | 32755 | 1.6%
Other values (43) | 589557 | 28.2%
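
The word counts are consistent with lowercasing each value, splitting on whitespace and trimming punctuation from the ends of each token, so that "Co." becomes "co". A minimal sketch under that assumption (the function `word_counts` is hypothetical and the tool's actual tokenizer may differ), applied to the five sample rows shown for this variable:

```python
from collections import Counter

def word_counts(values):
    """Count lowercased, whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            # Trim punctuation at the ends only; inner characters are kept.
            token = token.strip(".,;:()")
            if token:
                counts[token] += 1
    return counts

sample_rows = ["Co. Donegal", "Co. Kildare", "Co. Dublin", "Dublin 11", "Dublin 22"]
counts = word_counts(sample_rows)
print(counts.most_common(3))
```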

Most occurring characters

ValueCountFrequency (%)
o1152923
 
11.2%
1043910
 
10.1%
C982411
 
9.5%
.741717
 
7.2%
i648791
 
6.3%
l567063
 
5.5%
r485924
 
4.7%
a458380
 
4.4%
n436114
 
4.2%
e379830
 
3.7%
Other values (36)3404044
33.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6289600
61.1%
Uppercase Letter1860018
 
18.1%
Space Separator1043910
 
10.1%
Other Punctuation741717
 
7.2%
Decimal Number365862
 
3.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o1152923
18.3%
i648791
10.3%
l567063
9.0%
r485924
 
7.7%
a458380
 
7.3%
n436114
 
6.9%
e379830
 
6.0%
u343657
 
5.5%
b313810
 
5.0%
y248387
 
3.9%
Other values (13)1254721
19.9%
Uppercase Letter
ValueCountFrequency (%)
C982411
52.8%
D345527
 
18.6%
W117704
 
6.3%
L107620
 
5.8%
K97447
 
5.2%
M78586
 
4.2%
G56026
 
3.0%
T32565
 
1.8%
S15811
 
0.9%
O13836
 
0.7%
Decimal Number
ValueCountFrequency (%)
1125110
34.2%
255729
15.2%
440890
 
11.2%
534468
 
9.4%
828666
 
7.8%
625277
 
6.9%
318567
 
5.1%
717704
 
4.8%
912512
 
3.4%
06939
 
1.9%
Space Separator
ValueCountFrequency (%)
1043910
100.0%
Other Punctuation
ValueCountFrequency (%)
.741717
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8149618
79.1%
Common2151489
 
20.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
o1152923
14.1%
C982411
12.1%
i648791
 
8.0%
l567063
 
7.0%
r485924
 
6.0%
a458380
 
5.6%
n436114
 
5.4%
e379830
 
4.7%
D345527
 
4.2%
u343657
 
4.2%
Other values (24)2348998
28.8%
Common
ValueCountFrequency (%)
1043910
48.5%
.741717
34.5%
1125110
 
5.8%
255729
 
2.6%
440890
 
1.9%
534468
 
1.6%
828666
 
1.3%
625277
 
1.2%
318567
 
0.9%
717704
 
0.8%
Other values (2)19451
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII10301107
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o1152923
 
11.2%
1043910
 
10.1%
C982411
 
9.5%
.741717
 
7.2%
i648791
 
6.3%
l567063
 
5.5%
r485924
 
4.7%
a458380
 
4.4%
n436114
 
4.2%
e379830
 
3.7%
Other values (36)3404044
33.0%

DwellingTypeDescr
Categorical

HIGH CORRELATION

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.6 MiB

Detached house: 293509
Semi-detached house: 286024
Mid-terrace house: 148536
End of terrace house: 80130
Mid-floor apartment: 71142
Other values (6): 164569

Length

Max length: 22
Median length: 20
Mean length: 16.95907023
Min length: 5

Characters and Unicode

Total characters: 17703743
Distinct characters: 29
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Detached house
2nd row: Detached house
3rd row: Semi-detached house
4th row: Semi-detached house
5th row: Semi-detached house

Common Values

Value | Count | Frequency (%)
Detached house | 293509 | 28.1%
Semi-detached house | 286024 | 27.4%
Mid-terrace house | 148536 | 14.2%
End of terrace house | 80130 | 7.7%
Mid-floor apartment | 71142 | 6.8%
Top-floor apartment | 58379 | 5.6%
Ground-floor apartment | 56706 | 5.4%
House | 33950 | 3.3%
Maisonette | 11822 | 1.1%
Apartment | 3382 | 0.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
house842149
38.3%
detached293509
 
13.3%
semi-detached286024
 
13.0%
apartment189609
 
8.6%
mid-terrace148536
 
6.8%
end80130
 
3.6%
of80130
 
3.6%
terrace80130
 
3.6%
mid-floor71142
 
3.2%
top-floor58379
 
2.7%
Other values (4)69188
 
3.1%

Most occurring characters

ValueCountFrequency (%)
e2958814
16.7%
o1421640
 
8.0%
h1387732
 
7.8%
d1222071
 
6.9%
t1211391
 
6.8%
a1196187
 
6.8%
1155016
 
6.5%
u898855
 
5.1%
r889874
 
5.0%
s854301
 
4.8%
Other values (19)4507862
25.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter14883700
84.1%
Space Separator1155016
 
6.5%
Uppercase Letter1044240
 
5.9%
Dash Punctuation620787
 
3.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e2958814
19.9%
o1421640
9.6%
h1387732
9.3%
d1222071
8.2%
t1211391
8.1%
a1196187
8.0%
u898855
 
6.0%
r889874
 
6.0%
s854301
 
5.7%
c808199
 
5.4%
Other values (8)2034636
13.7%
Uppercase Letter
ValueCountFrequency (%)
D293839
28.1%
S286024
27.4%
M231500
22.2%
E80130
 
7.7%
T58379
 
5.6%
G56706
 
5.4%
H33950
 
3.3%
A3382
 
0.3%
B330
 
< 0.1%
Space Separator
ValueCountFrequency (%)
1155016
100.0%
Dash Punctuation
ValueCountFrequency (%)
-620787
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin15927940
90.0%
Common1775803
 
10.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e2958814
18.6%
o1421640
8.9%
h1387732
8.7%
d1222071
 
7.7%
t1211391
 
7.6%
a1196187
 
7.5%
u898855
 
5.6%
r889874
 
5.6%
s854301
 
5.4%
c808199
 
5.1%
Other values (17)3078876
19.3%
Common
ValueCountFrequency (%)
1155016
65.0%
-620787
35.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII17703743
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e2958814
16.7%
o1421640
 
8.0%
h1387732
 
7.8%
d1222071
 
6.9%
t1211391
 
6.8%
a1196187
 
6.8%
1155016
 
6.5%
u898855
 
5.1%
r889874
 
5.0%
s854301
 
4.8%
Other values (19)4507862
25.5%

YearofConstruction
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 267
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1983.789112
Minimum: 1753
Maximum: 2104
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 MiB

Quantile statistics

Minimum: 1753
5-th percentile: 1900
Q1: 1973
median: 1997
Q3: 2005
95-th percentile: 2019
Maximum: 2104
Range: 351
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 33.78415428
Coefficient of variation (CV): 0.01703011377
Kurtosis: 3.932625529
Mean: 1983.789112
Median Absolute Deviation (MAD): 12
Skewness: -1.815695191
Sum: 2070897292
Variance: 1141.36908
Monotonicity: Not monotonic
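
Several of these statistics are derived from the others: range is maximum minus minimum, IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the standard deviation squared. A short sketch that recomputes them from the YearofConstruction figures above (variable names are illustrative):

```python
# Figures copied from the YearofConstruction statistics.
mean = 1983.789112
std = 33.78415428
q1, q3 = 1973, 2005
minimum, maximum = 1753, 2104

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance

print(value_range, iqr, round(cv, 8), round(variance, 4))
```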
Histogram with fixed size bins (bins=50)
Most frequent values
Value | Count | Frequency (%)
2006 | 51793 | 5.0%
2004 | 48399 | 4.6%
2005 | 47419 | 4.5%
2007 | 40429 | 3.9%
2003 | 39650 | 3.8%
2002 | 33077 | 3.2%
1900 | 31000 | 3.0%
2000 | 29573 | 2.8%
2001 | 25570 | 2.4%
1998 | 24319 | 2.3%
Other values (257) | 672681 | 64.4%

Smallest values
Value | Count | Frequency (%)
1753 | 14 | < 0.1%
1757 | 1 | < 0.1%
1759 | 3 | < 0.1%
1760 | 224 | < 0.1%
1761 | 8 | < 0.1%
1762 | 1 | < 0.1%
1764 | 1 | < 0.1%
1765 | 5 | < 0.1%
1766 | 1 | < 0.1%
1767 | 2 | < 0.1%

Largest values
Value | Count | Frequency (%)
2104 | 1 | < 0.1%
2073 | 1 | < 0.1%
2033 | 1 | < 0.1%
2029 | 1 | < 0.1%
2024 | 1 | < 0.1%
2023 | 1 | < 0.1%
2022 | 6643 | 0.6%
2021 | 13620 | 1.3%
2020 | 17480 | 1.7%
2019 | 22155 | 2.1%
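
The largest construction years lie after the analysis date of 2022-07-20, which suggests data-entry errors. The sketch below flags them, reading each row of the largest-values table as a four-digit year followed by its count. Treating any year later than the analysis year as implausible is an assumption made here, not a rule stated by the report.

```python
from datetime import date

# Date on which the report says the analysis was run.
analysis_date = date(2022, 7, 20)

# (year, count) pairs read from the largest-values table.
largest_years = [
    (2104, 1), (2073, 1), (2033, 1), (2029, 1), (2024, 1),
    (2023, 1), (2022, 6643), (2021, 13620), (2020, 17480), (2019, 22155),
]

# Assumption: a construction year after the analysis year is an error.
implausible = [(year, n) for year, n in largest_years if year > analysis_date.year]
n_implausible = sum(n for _, n in implausible)

print(f"{n_implausible} rows have a construction year after {analysis_date.year}")
```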

EnergyRating
Categorical

HIGH CORRELATION

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 58.7 MiB

C2: 126621
C3: 119711
C1: 116607
D1: 115760
D2: 99524
Other values (10): 465687

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 2087820
Distinct characters: 11
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: C2
2nd row: B3
3rd row: C3
4th row: C2
5th row: D2

Common Values

Value | Count | Frequency (%)
C2 | 126621 | 12.1%
C3 | 119711 | 11.5%
C1 | 116607 | 11.2%
D1 | 115760 | 11.1%
D2 | 99524 | 9.5%
B3 | 81540 | 7.8%
G | 67179 | 6.4%
A3 | 59647 | 5.7%
E1 | 57450 | 5.5%
A2 | 53402 | 5.1%
Other values (5) | 146469 | 14.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
c2126621
12.1%
c3119711
11.5%
c1116607
11.2%
d1115760
11.1%
d299524
9.5%
b381540
7.8%
g67179
6.4%
a359647
5.7%
e157450
 
5.5%
a253402
 
5.1%
Other values (5)146469
14.0%

Most occurring characters

ValueCountFrequency (%)
C362939
17.4%
2360989
17.3%
1308141
14.8%
3260898
12.5%
D215284
10.3%
B134495
 
6.4%
A114428
 
5.5%
113882
 
5.5%
E102882
 
4.9%
G67179
 
3.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1043910
50.0%
Decimal Number930028
44.5%
Space Separator113882
 
5.5%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C362939
34.8%
D215284
20.6%
B134495
 
12.9%
A114428
 
11.0%
E102882
 
9.9%
G67179
 
6.4%
F46703
 
4.5%
Decimal Number
ValueCountFrequency (%)
2360989
38.8%
1308141
33.1%
3260898
28.1%
Space Separator
ValueCountFrequency (%)
113882
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1043910
50.0%
Common1043910
50.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
C362939
34.8%
D215284
20.6%
B134495
 
12.9%
A114428
 
11.0%
E102882
 
9.9%
G67179
 
6.4%
F46703
 
4.5%
Common
ValueCountFrequency (%)
2360989
34.6%
1308141
29.5%
3260898
25.0%
113882
 
10.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII2087820
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C362939
17.4%
2360989
17.3%
1308141
14.8%
3260898
12.5%
D215284
10.3%
B134495
 
6.4%
A114428
 
5.5%
113882
 
5.5%
E102882
 
4.9%
G67179
 
3.2%

BerRating
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 77722
Distinct (%): 7.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 234.5496591
Minimum: -158.42
Maximum: 56423.71
Zeros: 0
Zeros (%): 0.0%
Negative: 161
Negative (%): < 0.1%
Memory size: 8.0 MiB

Quantile statistics

Minimum: -158.42
5-th percentile: 49.31
Q1: 153.64
median: 207.08
Q3: 282.32
95-th percentile: 492.23
Maximum: 56423.71
Range: 56582.13
Interquartile range (IQR): 128.68

Descriptive statistics

Standard deviation: 172.7810331
Coefficient of variation (CV): 0.7366501138
Kurtosis: 13357.81442
Mean: 234.5496591
Median Absolute Deviation (MAD): 61.51
Skewness: 51.73654348
Sum: 244848734.6
Variance: 29853.28538
Monotonicity: Not monotonic
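
A skewness of about 51.7 means a small number of very large values dominates the right tail. As a rough sketch of what the statistic measures, the function below computes the moment-based skewness g1 = m3 / m2^1.5. The report does not state which estimator it uses (for example, whether a small-sample bias correction is applied), so this is an illustration rather than a reproduction, and the two toy lists are made up for the example.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]        # no tail on either side
right_tailed = [1, 1, 1, 1, 100]   # one extreme value on the right

print(skewness(symmetric))
print(skewness(right_tailed))
```

A single extreme value is enough to make the statistic clearly positive, which matches the pattern in BerRating, where the maximum of 56423.71 is far above the 95th percentile of 492.23.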
Histogram with fixed size bins (bins=50)
Most frequent values
Value | Count | Frequency (%)
224.56 | 105 | < 0.1%
181.97 | 96 | < 0.1%
224.87 | 93 | < 0.1%
224.85 | 93 | < 0.1%
224.95 | 91 | < 0.1%
55.98 | 91 | < 0.1%
224.63 | 90 | < 0.1%
174.81 | 89 | < 0.1%
199.92 | 89 | < 0.1%
212.52 | 88 | < 0.1%
Other values (77712) | 1042985 | 99.9%

Smallest values
Value | Count | Frequency (%)
-158.42 | 1 | < 0.1%
-97.37 | 1 | < 0.1%
-63.96 | 1 | < 0.1%
-60.97 | 1 | < 0.1%
-56.06 | 1 | < 0.1%
-49.16 | 1 | < 0.1%
-48.01 | 1 | < 0.1%
-45.32 | 1 | < 0.1%
-44.66 | 1 | < 0.1%
-43.64 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
56423.71 | 1 | < 0.1%
32134.94 | 1 | < 0.1%
31623.33 | 1 | < 0.1%
21725.62 | 1 | < 0.1%
18771.31 | 1 | < 0.1%
13914.78 | 1 | < 0.1%
11823.78 | 1 | < 0.1%
11476.29 | 1 | < 0.1%
9892.94 | 1 | < 0.1%
9183.17 | 1 | < 0.1%

GroundFloorArea(sq m)
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 43116
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 114.0114127
Minimum: 5.47
Maximum: 3546.11
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 MiB

Quantile statistics

Minimum: 5.47
5-th percentile: 47.71
Q1: 77.92
median: 100.3
Q3: 133.71
95-th percentile: 229.43
Maximum: 3546.11
Range: 3540.64
Interquartile range (IQR): 55.79

Descriptive statistics

Standard deviation: 59.76115294
Coefficient of variation (CV): 0.5241681647
Kurtosis: 36.34048694
Mean: 114.0114127
Median Absolute Deviation (MAD): 26.12
Skewness: 2.837078414
Sum: 119017653.8
Variance: 3571.395401
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values
Value | Count | Frequency (%)
81 | 1374 | 0.1%
80 | 1191 | 0.1%
90 | 1050 | 0.1%
84 | 1022 | 0.1%
82 | 995 | 0.1%
85 | 846 | 0.1%
78 | 840 | 0.1%
88 | 815 | 0.1%
70 | 807 | 0.1%
108 | 754 | 0.1%
Other values (43106) | 1034216 | 99.1%

Smallest values
Value | Count | Frequency (%)
5.47 | 1 | < 0.1%
6.7 | 1 | < 0.1%
7.21 | 1 | < 0.1%
7.26 | 1 | < 0.1%
7.47 | 1 | < 0.1%
7.7 | 1 | < 0.1%
7.91 | 1 | < 0.1%
7.96 | 1 | < 0.1%
8.3 | 1 | < 0.1%
8.31 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
3546.11 | 1 | < 0.1%
3229.39 | 1 | < 0.1%
2331.92 | 1 | < 0.1%
2011.25 | 1 | < 0.1%
1825.99 | 1 | < 0.1%
1788.6 | 1 | < 0.1%
1705.02 | 1 | < 0.1%
1625.34 | 1 | < 0.1%
1593 | 1 | < 0.1%
1572.51 | 1 | < 0.1%

CO2Rating
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 29109
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.55772451
Minimum: -258.25
Maximum: 18417.1
Zeros: 1
Zeros (%): < 0.1%
Negative: 210
Negative (%): < 0.1%
Memory size: 8.0 MiB

Quantile statistics

Minimum: -258.25
5-th percentile: 9.43
Q1: 32.46
median: 46.25
Q3: 65.13
95-th percentile: 123.22
Maximum: 18417.1
Range: 18675.35
Interquartile range (IQR): 32.67

Descriptive statistics

Standard deviation: 48.59638706
Coefficient of variation (CV): 0.8907333929
Kurtosis: 21972.23116
Mean: 54.55772451
Median Absolute Deviation (MAD): 15.71
Skewness: 68.08527183
Sum: 56953354.19
Variance: 2361.608835
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values
Value | Count | Frequency (%)
9.69 | 282 | < 0.1%
10.68 | 281 | < 0.1%
9.43 | 277 | < 0.1%
9.56 | 274 | < 0.1%
10.1 | 273 | < 0.1%
10.37 | 269 | < 0.1%
10.21 | 263 | < 0.1%
9.42 | 262 | < 0.1%
9.64 | 257 | < 0.1%
9.35 | 257 | < 0.1%
Other values (29099) | 1041215 | 99.7%

Smallest values
Value | Count | Frequency (%)
-258.25 | 1 | < 0.1%
-88.57 | 1 | < 0.1%
-27.98 | 1 | < 0.1%
-23.82 | 1 | < 0.1%
-20.25 | 1 | < 0.1%
-17.51 | 1 | < 0.1%
-16.02 | 1 | < 0.1%
-14.49 | 1 | < 0.1%
-13.67 | 1 | < 0.1%
-13.09 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
18417.1 | 1 | < 0.1%
10541 | 1 | < 0.1%
5840.25 | 1 | < 0.1%
4019.2 | 1 | < 0.1%
3467.31 | 1 | < 0.1%
3327.24 | 1 | < 0.1%
3283.77 | 1 | < 0.1%
2822.29 | 1 | < 0.1%
2817.23 | 1 | < 0.1%
2760.17 | 1 | < 0.1%

MainSpaceHeatingFuel
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 20
Distinct (%): < 0.1%
Missing: 15648
Missing (%): 1.5%
Memory size: 85.8 MiB

Mains Gas: 399479
Heating Oil: 367174
Electricity: 200464
Solid Multi-Fuel: 31706
Bulk LPG (propane or butane): 14346
Other values (15): 15093

Length

Max length: 30
Median length: 30
Mean length: 30
Min length: 30

Characters and Unicode

Total characters: 30847860
Distinct characters: 42
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Heating Oil
2nd row: Heating Oil
3rd row: Mains Gas
4th row: Mains Gas
5th row: Mains Gas

Common Values

Value | Count | Frequency (%)
Mains Gas | 399479 | 38.3%
Heating Oil | 367174 | 35.2%
Electricity | 200464 | 19.2%
Solid Multi-Fuel | 31706 | 3.0%
Bulk LPG (propane or butane) | 14346 | 1.4%
Manufactured Smokeless Fuel | 6714 | 0.6%
House Coal | 3180 | 0.3%
Wood Pellets (bulk supply for | 1480 | 0.1%
Sod Peat | 1240 | 0.1%
Bottled LPG | 1109 | 0.1%
Other values (10) | 1370 | 0.1%
(Missing) | 15648 | 1.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
mains399479
20.9%
gas399479
20.9%
heating367174
19.2%
oil367174
19.2%
electricity200560
10.5%
solid31706
 
1.7%
multi-fuel31706
 
1.7%
bulk15826
 
0.8%
lpg15455
 
0.8%
butane14346
 
0.8%
Other values (34)68321
 
3.6%

Most occurring characters

ValueCountFrequency (%)
20683808
67.1%
i1599064
 
5.2%
a1213361
 
3.9%
t827289
 
2.7%
s820201
 
2.7%
n802377
 
2.6%
l701261
 
2.3%
e665106
 
2.2%
M437899
 
1.4%
G414934
 
1.3%
Other values (32)2682560
 
8.7%

Most occurring categories

ValueCountFrequency (%)
Space Separator20683808
67.1%
Lowercase Letter8176284
 
26.5%
Uppercase Letter1925476
 
6.2%
Dash Punctuation31912
 
0.1%
Open Punctuation16034
 
0.1%
Close Punctuation14346
 
< 0.1%
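
The MainSpaceHeatingFuel figures point to fixed-width storage: every value has length 30, and about two thirds of all characters are spaces, so shorter labels appear to be padded with trailing spaces and longer ones cut off at 30 characters. The sketch below checks that reading against the reported totals and shows one way to strip the padding. The helper `clean_label` is hypothetical, and stripping cannot restore a label that was truncated at the width limit.

```python
# Figures copied from the MainSpaceHeatingFuel statistics.
n_obs = 1_043_910
n_missing = 15_648
total_chars = 30_847_860
space_chars = 20_683_808
FIELD_WIDTH = 30

n_present = n_obs - n_missing
# True if every non-missing value is exactly FIELD_WIDTH characters long.
is_fixed_width = total_chars == FIELD_WIDTH * n_present
space_share_pct = 100 * space_chars / total_chars

def clean_label(value: str) -> str:
    """Remove trailing padding from a fixed-width category label."""
    return value.rstrip()

padded = "Mains Gas".ljust(FIELD_WIDTH)
print(is_fixed_width, f"{space_share_pct:.1f}%", repr(clean_label(padded)))
```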

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i1599064
19.6%
a1213361
14.8%
t827289
10.1%
s820201
10.0%
n802377
9.8%
l701261
8.6%
e665106
8.1%
c408106
 
5.0%
g368106
 
4.5%
r238045
 
2.9%
Other values (12)533368
 
6.5%
Uppercase Letter
ValueCountFrequency (%)
M437899
22.7%
G414934
21.5%
H370354
19.2%
O367229
19.1%
E200560
10.4%
S39701
 
2.1%
F38420
 
2.0%
P18658
 
1.0%
L16124
 
0.8%
B15735
 
0.8%
Other values (6)5862
 
0.3%
Space Separator
ValueCountFrequency (%)
20683808
100.0%
Dash Punctuation
ValueCountFrequency (%)
-31912
100.0%
Open Punctuation
ValueCountFrequency (%)
(16034
100.0%
Close Punctuation
ValueCountFrequency (%)
)14346
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common20746100
67.3%
Latin10101760
32.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
i1599064
15.8%
a1213361
12.0%
t827289
 
8.2%
s820201
 
8.1%
n802377
 
7.9%
l701261
 
6.9%
e665106
 
6.6%
M437899
 
4.3%
G414934
 
4.1%
c408106
 
4.0%
Other values (28)2212162
21.9%
Common
ValueCountFrequency (%)
20683808
99.7%
-31912
 
0.2%
(16034
 
0.1%
)14346
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII30847860
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
20683808
67.1%
i1599064
 
5.2%
a1213361
 
3.9%
t827289
 
2.7%
s820201
 
2.7%
n802377
 
2.6%
l701261
 
2.3%
e665106
 
2.2%
M437899
 
1.4%
G414934
 
1.3%
Other values (32)2682560
 
8.7%

MainWaterHeatingFuel
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 21
Distinct (%): < 0.1%
Missing: 15648
Missing (%): 1.5%
Memory size: 85.8 MiB

Mains Gas: 397437
Heating Oil: 364085
Electricity: 208906
Solid Multi-Fuel: 29558
Bulk LPG (propane or butane): 14286
Other values (16): 13990

Length

Max length: 30
Median length: 30
Mean length: 30
Min length: 30

Characters and Unicode

Total characters: 30847860
Distinct characters: 42
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Heating Oil
2nd row: Heating Oil
3rd row: Mains Gas
4th row: Mains Gas
5th row: Mains Gas

Common Values

Value | Count | Frequency (%)
Mains Gas | 397437 | 38.1%
Heating Oil | 364085 | 34.9%
Electricity | 208906 | 20.0%
Solid Multi-Fuel | 29558 | 2.8%
Bulk LPG (propane or butane) | 14286 | 1.4%
Manufactured Smokeless Fuel | 5700 | 0.5%
House Coal | 3022 | 0.3%
Wood Pellets (bulk supply for | 1429 | 0.1%
Sod Peat | 1262 | 0.1%
Bottled LPG | 1216 | 0.1%
Other values (11) | 1361 | 0.1%
(Missing) | 15648 | 1.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
mains397437
20.9%
gas397437
20.9%
heating364085
19.1%
oil364085
19.1%
electricity209003
11.0%
solid29558
 
1.6%
multi-fuel29558
 
1.6%
bulk15715
 
0.8%
lpg15502
 
0.8%
butane14286
 
0.8%
Other values (35)64822
 
3.4%

Most occurring characters

ValueCountFrequency (%)
20698285
67.1%
i1603491
 
5.2%
a1203962
 
3.9%
t838113
 
2.7%
s813842
 
2.6%
n796173
 
2.6%
l697872
 
2.3%
e664097
 
2.2%
M432695
 
1.4%
c424015
 
1.4%
Other values (32)2675315
 
8.7%

Most occurring categories

ValueCountFrequency (%)
Space Separator20698285
67.1%
Lowercase Letter8175690
 
26.5%
Uppercase Letter1913924
 
6.2%
Dash Punctuation29739
 
0.1%
Open Punctuation15936
 
0.1%
Close Punctuation14286
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i1603491
19.6%
a1203962
14.7%
t838113
10.3%
s813842
10.0%
n796173
9.7%
l697872
8.5%
e664097
8.1%
c424015
 
5.2%
g364954
 
4.5%
r245374
 
3.0%
Other values (12)523797
 
6.4%
Uppercase Letter
ValueCountFrequency (%)
M432695
22.6%
G412939
21.6%
H367107
19.2%
O364127
19.0%
E209003
10.9%
S36575
 
1.9%
F35258
 
1.8%
P18686
 
1.0%
L16108
 
0.8%
B15788
 
0.8%
Other values (6)5638
 
0.3%
Space Separator
ValueCountFrequency (%)
20698285
100.0%
Dash Punctuation
ValueCountFrequency (%)
-29739
100.0%
Open Punctuation
ValueCountFrequency (%)
(15936
100.0%
Close Punctuation
ValueCountFrequency (%)
)14286
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common20758246
67.3%
Latin10089614
32.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
i1603491
15.9%
a1203962
11.9%
t838113
 
8.3%
s813842
 
8.1%
n796173
 
7.9%
l697872
 
6.9%
e664097
 
6.6%
M432695
 
4.3%
c424015
 
4.2%
G412939
 
4.1%
Other values (28)2202415
21.8%
Common
ValueCountFrequency (%)
20698285
99.7%
-29739
 
0.1%
(15936
 
0.1%
)14286
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII30847860
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
20698285
67.1%
i1603491
 
5.2%
a1203962
 
3.9%
t838113
 
2.7%
s813842
 
2.6%
n796173
 
2.6%
l697872
 
2.3%
e664097
 
2.2%
M432695
 
1.4%
c424015
 
1.4%
Other values (32)2675315
 
8.7%

VentilationMethod
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 3613
Missing (%): 0.3%
Memory size: 70.5 MiB

Natural vent.: 967791
Whole house extract vent.: 36357
Bal.whole mech.vent heat recvr: 33861
Pos input vent.- loft: 1406
Bal.whole mech.vent no heat re: 445

Length

Max length: 30
Median length: 13
Mean length: 13.99542823
Min length: 13

Characters and Unicode

Total characters: 14559402
Distinct characters: 26
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Natural vent.
2nd row: Natural vent.
3rd row: Natural vent.
4th row: Natural vent.
5th row: Natural vent.

Common Values

Value | Count | Frequency (%)
Natural vent. | 967791 | 92.7%
Whole house extract vent. | 36357 | 3.5%
Bal.whole mech.vent heat recvr | 33861 | 3.2%
Pos input vent.- loft | 1406 | 0.1%
Bal.whole mech.vent no heat re | 445 | < 0.1%
Pos input vent.- outside | 437 | < 0.1%
(Missing) | 3613 | 0.3%

Length

Histogram of lengths of the category

Category Frequency Plot
ValueCountFrequency (%)
vent1005991
45.2%
natural967791
43.5%
whole36357
 
1.6%
house36357
 
1.6%
extract36357
 
1.6%
bal.whole34306
 
1.5%
mech.vent34306
 
1.5%
heat34306
 
1.5%
recvr33861
 
1.5%
pos1843
 
0.1%
Other values (5)4576
 
0.2%

Most occurring characters

ValueCountFrequency (%)
t2118794
14.6%
a2040551
14.0%
e1287029
8.8%
1185754
8.1%
.1074603
7.4%
l1074166
7.4%
v1074158
7.4%
r1072315
7.4%
n1042585
7.2%
u1006428
6.9%
Other values (16)1583019
10.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter11256905
77.3%
Space Separator1185754
 
8.1%
Other Punctuation1074603
 
7.4%
Uppercase Letter1040297
 
7.1%
Dash Punctuation1843
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t2118794
18.8%
a2040551
18.1%
e1287029
11.4%
l1074166
9.5%
v1074158
9.5%
r1072315
9.5%
n1042585
9.3%
u1006428
8.9%
h175632
 
1.6%
o111151
 
1.0%
Other values (9)254096
 
2.3%
Uppercase Letter
ValueCountFrequency (%)
N967791
93.0%
W36357
 
3.5%
B34306
 
3.3%
P1843
 
0.2%
Space Separator
ValueCountFrequency (%)
1185754
100.0%
Other Punctuation
ValueCountFrequency (%)
.1074603
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1843
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin12297202
84.5%
Common2262200
 
15.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
t2118794
17.2%
a2040551
16.6%
e1287029
10.5%
l1074166
8.7%
v1074158
8.7%
r1072315
8.7%
n1042585
8.5%
u1006428
8.2%
N967791
7.9%
h175632
 
1.4%
Other values (13)437753
 
3.6%
Common
ValueCountFrequency (%)
1185754
52.4%
.1074603
47.5%
-1843
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII14559402
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t2118794
14.6%
a2040551
14.0%
e1287029
8.8%
1185754
8.1%
.1074603
7.4%
l1074166
7.4%
v1074158
7.4%
r1072315
7.4%
n1042585
7.2%
u1006428
6.9%
Other values (16)1583019
10.9%

StructureType
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 3613
Missing (%): 0.3%
Memory size: 86.4 MiB

Masonry: 888076
Please select: 82498
Timber or Steel Frame: 62959
Insulated Conctete Form: 6764

Length

Max length: 30
Median length: 30
Mean length: 30
Min length: 30

Characters and Unicode

Total characters: 31208910
Distinct characters: 23
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Masonry
2nd row: Masonry
3rd row: Masonry
4th row: Masonry
5th row: Masonry

Common Values

Value | Count | Frequency (%)
Masonry | 888076 | 85.1%
Please select | 82498 | 7.9%
Timber or Steel Frame | 62959 | 6.0%
Insulated Conctete Form | 6764 | 0.6%
(Missing) | 3613 | 0.3%
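
"Please select" looks like an unfilled form default rather than a real structure type. If it is treated as missing (an assumption made here, not something the report states), the effective missing share of StructureType rises from 0.3% to about 8.2%. A sketch of that calculation, with a hypothetical helper `normalise_structure_type` that maps the placeholder to `None`:

```python
# Figures copied from the StructureType common values.
n_obs = 1_043_910
n_missing = 3_613
n_placeholder = 82_498  # rows whose value is "Please select"

# Assumption: the placeholder carries no information and counts as missing.
effective_missing = n_missing + n_placeholder
effective_missing_pct = 100 * effective_missing / n_obs

PLACEHOLDERS = {"Please select"}

def normalise_structure_type(value):
    """Strip padding and map placeholder labels to None."""
    if value is None:
        return None
    label = value.strip()
    return None if label in PLACEHOLDERS else label

print(effective_missing, f"{effective_missing_pct:.1f}%")
```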

Length

Histogram of lengths of the category

Category Frequency Plot
ValueCountFrequency (%)
masonry888076
67.0%
please82498
 
6.2%
select82498
 
6.2%
timber62959
 
4.8%
or62959
 
4.8%
steel62959
 
4.8%
frame62959
 
4.8%
insulated6764
 
0.5%
conctete6764
 
0.5%
form6764
 
0.5%

Most occurring characters

ValueCountFrequency (%)
22727096
72.8%
r1083717
 
3.5%
s1059836
 
3.4%
a1040297
 
3.3%
o964563
 
3.1%
n901604
 
2.9%
M888076
 
2.8%
y888076
 
2.8%
e602120
 
1.9%
l234719
 
0.8%
Other values (13)818806
 
2.6%

Most occurring categories

ValueCountFrequency (%)
Space Separator22727096
72.8%
Lowercase Letter7302071
 
23.4%
Uppercase Letter1179743
 
3.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r1083717
14.8%
s1059836
14.5%
a1040297
14.2%
o964563
13.2%
n901604
12.3%
y888076
12.2%
e602120
8.2%
l234719
 
3.2%
t165749
 
2.3%
m132682
 
1.8%
Other values (5)228708
 
3.1%
Uppercase Letter
ValueCountFrequency (%)
M888076
75.3%
P82498
 
7.0%
F69723
 
5.9%
T62959
 
5.3%
S62959
 
5.3%
I6764
 
0.6%
C6764
 
0.6%
Space Separator
ValueCountFrequency (%)
22727096
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common22727096
72.8%
Latin8481814
 
27.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
r1083717
12.8%
s1059836
12.5%
a1040297
12.3%
o964563
11.4%
n901604
10.6%
M888076
10.5%
y888076
10.5%
e602120
7.1%
l234719
 
2.8%
t165749
 
2.0%
Other values (12)653057
7.7%
Common
ValueCountFrequency (%)
22727096
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII31208910
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
22727096
72.8%
r1083717
 
3.5%
s1059836
 
3.4%
a1040297
 
3.3%
o964563
 
3.1%
n901604
 
2.9%
M888076
 
2.8%
y888076
 
2.8%
e602120
 
1.9%
l234719
 
0.8%
Other values (13)818806
 
2.6%
Distinct: 5
Distinct (%): < 0.1%
Missing: 3613
Missing (%): 0.3%
Memory size: 59.7 MiB

2.0: 427184
3.0: 283384
4.0: 144520
1.0: 120592
0.0: 64617

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 3120891
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 2.0
3rd row: 3.0
4th row: 2.0
5th row: 2.0

Common Values

Value | Count | Frequency (%)
2.0 | 427184 | 40.9%
3.0 | 283384 | 27.1%
4.0 | 144520 | 13.8%
1.0 | 120592 | 11.6%
0.0 | 64617 | 6.2%
(Missing) | 3613 | 0.3%

Length

Histogram of lengths of the category

Category Frequency Plot
ValueCountFrequency (%)
2.0427184
41.1%
3.0283384
27.2%
4.0144520
 
13.9%
1.0120592
 
11.6%
0.064617
 
6.2%

Most occurring characters

ValueCountFrequency (%)
01104914
35.4%
.1040297
33.3%
2427184
 
13.7%
3283384
 
9.1%
4144520
 
4.6%
1120592
 
3.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2080594
66.7%
Other Punctuation1040297
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
01104914
53.1%
2427184
 
20.5%
3283384
 
13.6%
4144520
 
6.9%
1120592
 
5.8%
Other Punctuation
ValueCountFrequency (%)
.1040297
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common3120891
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
01104914
35.4%
.1040297
33.3%
2427184
 
13.7%
3283384
 
9.1%
4144520
 
4.6%
1120592
 
3.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII3120891
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
01104914
35.4%
.1040297
33.3%
2427184
 
13.7%
3283384
 
9.1%
4144520
 
4.6%
1120592
 
3.9%

InsulationType
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 232787
Missing (%): 22.3%
Memory size: 74.4 MiB
Value | Count
Factory Insulated | 515548
Loose Jacket | 199203
None | 96372

Length

Max length: 30
Median length: 30
Mean length: 30
Min length: 30

Characters and Unicode

Total characters: 24333690
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Factory Insulated
2nd row: Factory Insulated
3rd row: Loose Jacket
4th row: Loose Jacket
5th row: Factory Insulated

Common Values

Value | Count | Frequency (%)
Factory Insulated | 515548 | 49.4%
Loose Jacket | 199203 | 19.1%
None | 96372 | 9.2%
(Missing) | 232787 | 22.3%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
factory | 515548 | 33.8%
insulated | 515548 | 33.8%
loose | 199203 | 13.1%
jacket | 199203 | 13.1%
none | 96372 | 6.3%

Most occurring characters

ValueCountFrequency (%)
13508201
55.5%
t1230299
 
5.1%
a1230299
 
5.1%
o1010326
 
4.2%
e1010326
 
4.2%
c714751
 
2.9%
s714751
 
2.9%
n611920
 
2.5%
u515548
 
2.1%
d515548
 
2.1%
Other values (9)3271721
 
13.4%

Most occurring categories

ValueCountFrequency (%)
Space Separator13508201
55.5%
Lowercase Letter9299615
38.2%
Uppercase Letter1525874
 
6.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t1230299
13.2%
a1230299
13.2%
o1010326
10.9%
e1010326
10.9%
c714751
7.7%
s714751
7.7%
n611920
6.6%
u515548
5.5%
d515548
5.5%
l515548
5.5%
Other values (3)1230299
13.2%
Uppercase Letter
ValueCountFrequency (%)
F515548
33.8%
I515548
33.8%
L199203
 
13.1%
J199203
 
13.1%
N96372
 
6.3%
Space Separator
ValueCountFrequency (%)
13508201
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common13508201
55.5%
Latin10825489
44.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
t1230299
11.4%
a1230299
11.4%
o1010326
 
9.3%
e1010326
 
9.3%
c714751
 
6.6%
s714751
 
6.6%
n611920
 
5.7%
u515548
 
4.8%
d515548
 
4.8%
l515548
 
4.8%
Other values (8)2756173
25.5%
Common
ValueCountFrequency (%)
13508201
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII24333690
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
13508201
55.5%
t1230299
 
5.1%
a1230299
 
5.1%
o1010326
 
4.2%
e1010326
 
4.2%
c714751
 
2.9%
s714751
 
2.9%
n611920
 
2.5%
u515548
 
2.1%
d515548
 
2.1%
Other values (9)3271721
 
13.4%

InsulationThickness
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 183
Distinct (%): < 0.1%
Missing: 232787
Missing (%): 22.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.62129622
Minimum: 0
Maximum: 1872
Zeros: 128896
Zeros (%): 12.3%
Negative: 0
Negative (%): 0.0%
Memory size: 8.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 25
median: 30
Q3: 40
95-th percentile: 75
Maximum: 1872
Range: 1872
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 23.80880348
Coefficient of variation (CV): 0.7529357212
Kurtosis: 1384.949307
Mean: 31.62129622
Median Absolute Deviation (MAD): 10
Skewness: 19.06388138
Sum: 25648760.65
Variance: 566.8591229
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
30 | 151255 | 14.5%
0 | 128896 | 12.3%
25 | 120261 | 11.5%
50 | 111547 | 10.7%
35 | 94791 | 9.1%
40 | 64568 | 6.2%
20 | 47566 | 4.6%
80 | 35009 | 3.4%
60 | 14649 | 1.4%
15 | 7859 | 0.8%
Other values (173) | 34722 | 3.3%
(Missing) | 232787 | 22.3%
ValueCountFrequency (%)
0128896
12.3%
1225
 
< 0.1%
1.7521
 
< 0.1%
1.791
 
< 0.1%
1.892
 
< 0.1%
1.915
 
< 0.1%
1.921
 
< 0.1%
267
 
< 0.1%
2.331
 
< 0.1%
2.352
 
< 0.1%
ValueCountFrequency (%)
187231
< 0.1%
8901
 
< 0.1%
8701
 
< 0.1%
8011
 
< 0.1%
8001
 
< 0.1%
6702
 
< 0.1%
6601
 
< 0.1%
6004
 
< 0.1%
5803
 
< 0.1%
5601
 
< 0.1%

TotalDeliveredEnergy
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING
SKEWED

Distinct: 431412
Distinct (%): 96.8%
Missing: 598110
Missing (%): 57.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 24085.64042
Minimum: -3929.793
Maximum: 5431169.676
Zeros: 0
Zeros (%): 0.0%
Negative: 3
Negative (%): < 0.1%
Memory size: 8.0 MiB

Quantile statistics

Minimum: -3929.793
5-th percentile: 7979.789
Q1: 14844.7955
median: 21027.0305
Q3: 29315.55175
95-th percentile: 48890.06385
Maximum: 5431169.676
Range: 5435099.469
Interquartile range (IQR): 14470.75625

Descriptive statistics

Standard deviation: 23306.37571
Coefficient of variation (CV): 0.9676460873
Kurtosis: 14625.08005
Mean: 24085.64042
Median Absolute Deviation (MAD): 6989.88
Skewness: 87.22682973
Sum: 1.07373785 × 10^10
Variance: 543187148.8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
11864.494 | 37 | < 0.1%
6775.83 | 36 | < 0.1%
11281.876 | 35 | < 0.1%
9995.008 | 29 | < 0.1%
184.697 | 27 | < 0.1%
641.717 | 25 | < 0.1%
105.592 | 25 | < 0.1%
9139.246 | 25 | < 0.1%
477.232 | 24 | < 0.1%
13345.744 | 24 | < 0.1%
Other values (431402) | 445513 | 42.7%
(Missing) | 598110 | 57.3%
Value | Count | Frequency (%)
-3929.793 | 1 | < 0.1%
-2843.478 | 1 | < 0.1%
-1805.047 | 1 | < 0.1%
50.563 | 1 | < 0.1%
56.181 | 1 | < 0.1%
69.853 | 1 | < 0.1%
72.877 | 1 | < 0.1%
73.373 | 1 | < 0.1%
74.998 | 1 | < 0.1%
77.344 | 1 | < 0.1%
Value | Count | Frequency (%)
5431169.676 | 1 | < 0.1%
4129347.104 | 1 | < 0.1%
3846582.646 | 1 | < 0.1%
3444097.206 | 1 | < 0.1%
3343274.743 | 1 | < 0.1%
3133044.042 | 1 | < 0.1%
2868451.102 | 1 | < 0.1%
2844373.55 | 1 | < 0.1%
2370104.193 | 1 | < 0.1%
2050664.845 | 1 | < 0.1%

DateOfAssessment
Categorical

HIGH CARDINALITY

Distinct: 5282
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 75.7 MiB
Value | Count
Oct 31 2019 12:00AM | 662
Jun 28 2022 12:00AM | 628
Jun 27 2022 12:00AM | 601
May 26 2022 12:00AM | 589
Apr 28 2011 12:00AM | 583
Other values (5277) | 1040847

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 19834290
Distinct characters: 34
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 90
Unique (%): < 0.1%

Sample

1st row: Oct 8 2010 12:00AM
2nd row: Oct 19 2010 12:00AM
3rd row: Oct 9 2010 12:00AM
4th row: Oct 20 2010 12:00AM
5th row: Oct 21 2010 12:00AM
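
DateOfAssessment is profiled as a high-cardinality categorical because the values are date strings. A minimal sketch of parsing them follows. The format string is inferred from the sample rows and is an assumption, not something the report states. The constant length of 19 suggests single-digit days are space-padded in the raw data, so the hypothetical samples below include that padding. Month abbreviations and AM/PM are matched according to the current locale, so an English or C locale is assumed.

```python
from datetime import datetime

# Assumed format, inferred from samples such as "Oct 19 2010 12:00AM".
DATE_FORMAT = "%b %d %Y %I:%M%p"


def parse_assessment_date(text):
    """Parse one DateOfAssessment string into a datetime.

    strptime treats any run of whitespace in the input as matching a
    single space in the format, so space-padded days are accepted.
    """
    return datetime.strptime(text.strip(), DATE_FORMAT)


# Hypothetical raw values modelled on the sample rows.
samples = ["Oct  8 2010 12:00AM", "Oct 19 2010 12:00AM", "Jul  4 2022 12:00AM"]
parsed = [parse_assessment_date(s) for s in samples]

for raw, value in zip(samples, parsed):
    print(raw, "->", value.date().isoformat())
```

Once parsed, the column can be treated as a date rather than as 5282 distinct categories.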

Common Values

Value | Count | Frequency (%)
Oct 31 2019 12:00AM | 662 | 0.1%
Jun 28 2022 12:00AM | 628 | 0.1%
Jun 27 2022 12:00AM | 601 | 0.1%
May 26 2022 12:00AM | 589 | 0.1%
Apr 28 2011 12:00AM | 583 | 0.1%
May 23 2022 12:00AM | 577 | 0.1%
Mar 4 2015 12:00AM | 566 | 0.1%
Jul 4 2022 12:00AM | 560 | 0.1%
May 16 2022 12:00AM | 559 | 0.1%
Oct 21 2021 12:00AM | 555 | 0.1%
Other values (5272) | 1038030 | 99.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
12:00am1043910
25.0%
nov97974
 
2.3%
oct96504
 
2.3%
may92206
 
2.2%
jun91621
 
2.2%
mar91208
 
2.2%
sep90677
 
2.2%
201488313
 
2.1%
201987904
 
2.1%
jul87543
 
2.1%
Other values (57)2307780
55.3%

Most occurring characters

ValueCountFrequency (%)
3434822
17.3%
03426234
17.3%
22861040
14.4%
12448191
12.3%
M1227324
 
6.2%
A1207195
 
6.1%
:1043910
 
5.3%
a259752
 
1.3%
u259367
 
1.3%
J255502
 
1.3%
Other values (24)3410953
17.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number10136008
51.1%
Space Separator3434822
 
17.3%
Uppercase Letter3131730
 
15.8%
Lowercase Letter2087820
 
10.5%
Other Punctuation1043910
 
5.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a259752
12.4%
u259367
12.4%
e247231
11.8%
r174290
8.3%
p173759
8.3%
n167959
8.0%
c165854
7.9%
v97974
 
4.7%
o97974
 
4.7%
t96504
 
4.6%
Other values (4)347156
16.6%
Decimal Number
ValueCountFrequency (%)
03426234
33.8%
22861040
28.2%
12448191
24.2%
9245613
 
2.4%
3231310
 
2.3%
4192489
 
1.9%
8187702
 
1.9%
7182712
 
1.8%
5181949
 
1.8%
6178768
 
1.8%
Uppercase Letter
ValueCountFrequency (%)
M1227324
39.2%
A1207195
38.5%
J255502
 
8.2%
N97974
 
3.1%
O96504
 
3.1%
S90677
 
2.9%
F87204
 
2.8%
D69350
 
2.2%
Space Separator
ValueCountFrequency (%)
3434822
100.0%
Other Punctuation
ValueCountFrequency (%)
:1043910
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common14614740
73.7%
Latin5219550
 
26.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
M1227324
23.5%
A1207195
23.1%
a259752
 
5.0%
u259367
 
5.0%
J255502
 
4.9%
e247231
 
4.7%
r174290
 
3.3%
p173759
 
3.3%
n167959
 
3.2%
c165854
 
3.2%
Other values (12)1081317
20.7%
Common
ValueCountFrequency (%)
3434822
23.5%
03426234
23.4%
22861040
19.6%
12448191
16.8%
:1043910
 
7.1%
9245613
 
1.7%
3231310
 
1.6%
4192489
 
1.3%
8187702
 
1.3%
7182712
 
1.3%
Other values (2)360717
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII19834290
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3434822
17.3%
03426234
17.3%
22861040
14.4%
12448191
12.3%
M1227324
 
6.2%
A1207195
 
6.1%
:1043910
 
5.3%
a259752
 
1.3%
u259367
 
1.3%
J255502
 
1.3%
Other values (24)3410953
17.2%

Interactions

[Figures: 36 pairwise interaction plots]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
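
The calculation described above can be sketched in plain Python. This is an illustrative implementation and not the code the report generator uses. The helper names are hypothetical, and tied values are given the average of their ranks, which is a common convention but an assumption here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations (1/n factors cancel)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho is Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# A nonlinear but strictly increasing relationship gives rho close to 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```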

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
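
A short sketch of this formula, together with a check of the location and scale invariance mentioned above, is given below. The function name and the toy data are illustrative assumptions.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors in the covariance and the standard deviations cancel,
    so plain sums of deviations are used.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]

r_original = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(r_original, r_transformed)  # the two values agree up to rounding
```

Multiplying one variable by a negative factor flips the sign of r but leaves its magnitude unchanged.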

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
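
The pair-counting definition can be sketched directly. The version below is the plain form described above (often called tau-a), in which tied pairs count as neither concordant nor discordant. Library implementations frequently use tie-corrected variants, so their results can differ when ties are present. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Five concordant pairs and one discordant pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The double loop over all pairs is quadratic in the number of observations, so this form is for illustration on small samples only.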

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
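
A minimal sketch of a bias-corrected Cramér's V follows, written from the commonly cited form of Bergsma's correction. It is an illustration under that assumption and not necessarily identical to the report generator's code. The input is a contingency table of counts given as a list of rows, and every row and column is assumed to have a non-zero total.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrections: subtract the expected bias and shrink the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


independent = [[10, 20], [20, 40]]   # rows are proportional to each other
associated = [[50, 0], [0, 50]]      # each row determines the column

print(cramers_v_corrected(independent), cramers_v_corrected(associated))
```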

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
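
The nullity correlation shown in the heatmap can be illustrated as the Pearson correlation between presence indicators of two columns. The sketch below assumes missing cells are represented as None. The function names and the toy columns are hypothetical. In this dataset, InsulationType and InsulationThickness both report 232787 missing values, which is the kind of pattern such a measure would flag if the gaps fall on the same rows.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation; returns NaN when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # a fully present or fully missing column
    return cov / (sx * sy)


def nullity_correlation(col_a, col_b):
    """Correlate 'value is present' indicators (1.0 present, 0.0 missing)."""
    present_a = [0.0 if v is None else 1.0 for v in col_a]
    present_b = [0.0 if v is None else 1.0 for v in col_b]
    return pearson_r(present_a, present_b)


# Toy columns in which both fields are missing on exactly the same rows.
insulation_type = ["Factory Insulated", None, "Loose Jacket", None]
insulation_thickness = [20.0, None, 20.0, None]

print(nullity_correlation(insulation_type, insulation_thickness))
```

A value near +1 means the two columns tend to be missing together, and a value near -1 means one tends to be present when the other is missing.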

Sample

First rows

Index | CountyName | DwellingTypeDescr | YearofConstruction | EnergyRating | BerRating | GroundFloorArea(sq m) | CO2Rating | MainSpaceHeatingFuel | MainWaterHeatingFuel | VentilationMethod | StructureType | NoOfSidesSheltered | InsulationType | InsulationThickness | TotalDeliveredEnergy | DateOfAssessment
0 | Co. Donegal | Detached house | 1997 | C2 | 180.01 | 171.19 | 45.53 | Heating Oil | Heating Oil | Natural vent. | Masonry | 1.00 | Factory Insulated | 20.00 | 25474.52 | Oct 8 2010 12:00AM
1 | Co. Kildare | Detached house | 2010 | B3 | 137.56 | 242.93 | 35.66 | Heating Oil | Heating Oil | Natural vent. | Masonry | 2.00 | Factory Insulated | 50.00 | 27654.47 | Oct 19 2010 12:00AM
2 | Co. Dublin | Semi-detached house | 1999 | C3 | 223.61 | 99.38 | 44.65 | Mains Gas | Mains Gas | Natural vent. | Masonry | 3.00 | Loose Jacket | 20.00 | 17000.04 | Oct 9 2010 12:00AM
3 | Dublin 11 | Semi-detached house | 1965 | C2 | 196.99 | 138.41 | 37.83 | Mains Gas | Mains Gas | Natural vent. | Masonry | 2.00 | NaN | NaN | 22708.48 | Oct 20 2010 12:00AM
4 | Dublin 22 | Semi-detached house | 1985 | D2 | 260.52 | 127.16 | 55.07 | Mains Gas | Mains Gas | Natural vent. | Masonry | 2.00 | Loose Jacket | 100.00 | 28182.86 | Oct 21 2010 12:00AM
5 | Co. Donegal | House | 1975 | D1 | 248.00 | 88.57 | 62.68 | Heating Oil | Heating Oil | Natural vent. | Masonry | 2.00 | Factory Insulated | 0.00 | 18470.03 | Oct 16 2010 12:00AM
6 | Dublin 22 | Semi-detached house | 1985 | D2 | 275.97 | 73.54 | 58.21 | Mains Gas | Mains Gas | Natural vent. | Masonry | 2.00 | Loose Jacket | 100.00 | 17227.86 | Oct 21 2010 12:00AM
7 | Limerick City | Semi-detached house | 1960 | D1 | 244.71 | 89.54 | 59.79 | Heating Oil | Heating Oil | Natural vent. | Masonry | 3.00 | Loose Jacket | 80.00 | 16711.95 | Oct 21 2010 12:00AM
8 | Co. Kerry | House | 1973 | D2 | 293.82 | 157.62 | 71.30 | Heating Oil | Heating Oil | Natural vent. | Masonry | 1.00 | Loose Jacket | 50.00 | 40212.93 | Oct 21 2010 12:00AM
9 | Co. Kilkenny | Detached house | 1980 | D2 | 299.96 | 91.44 | 75.47 | Heating Oil | Heating Oil | Natural vent. | Masonry | 1.00 | Loose Jacket | 20.00 | 21839.38 | Oct 20 2010 12:00AM

Last rows

Index | CountyName | DwellingTypeDescr | YearofConstruction | EnergyRating | BerRating | GroundFloorArea(sq m) | CO2Rating | MainSpaceHeatingFuel | MainWaterHeatingFuel | VentilationMethod | StructureType | NoOfSidesSheltered | InsulationType | InsulationThickness | TotalDeliveredEnergy | DateOfAssessment
1043900 | Dublin 7 | Top-floor apartment | 2022 | A2 | 46.90 | 77.00 | 8.82 | NaN | NaN | Bal.whole mech.vent heat recvr | Masonry | 2.00 | NaN | NaN | NaN | Jun 28 2022 12:00AM
1043901 | Co. Dublin | Ground-floor apartment | 2004 | D2 | 281.34 | 73.50 | 55.32 | Electricity | Electricity | Natural vent. | Masonry | 3.00 | NaN | NaN | NaN | Jul 1 2022 12:00AM
1043902 | Dublin 18 | Mid-floor apartment | 2020 | A3 | 55.23 | 37.80 | 10.86 | Electricity | Electricity | NaN | NaN | NaN | NaN | NaN | NaN | Jul 1 2022 12:00AM
1043903 | Dublin 18 | Mid-floor apartment | 2020 | A2 | 46.32 | 50.89 | 9.11 | Electricity | Electricity | NaN | NaN | NaN | NaN | NaN | NaN | Jul 1 2022 12:00AM
1043904 | Dublin 18 | Mid-floor apartment | 2020 | A2 | 38.60 | 86.58 | 7.59 | Electricity | Electricity | NaN | NaN | NaN | NaN | NaN | NaN | Jul 4 2022 12:00AM
1043905 | Co. Donegal | Detached house | 1982 | D2 | 282.58 | 214.18 | 70.89 | Heating Oil | Heating Oil | Natural vent. | Masonry | 1.00 | NaN | NaN | 52927.53 | Jun 25 2022 12:00AM
1043906 | Dublin 6 | Mid-terrace house | 1900 | G | 998.14 | 99.77 | 317.99 | Manufactured Smokeless Fuel | Electricity | Natural vent. | Masonry | 4.00 | NaN | NaN | NaN | Jul 6 2022 12:00AM
1043907 | Dublin 1 | Mid-floor apartment | 2021 | A2 | 37.26 | 81.32 | 7.33 | NaN | NaN | Whole house extract vent. | Masonry | 2.00 | NaN | NaN | NaN | Dec 20 2021 12:00AM
1043908 | Dublin 1 | Mid-floor apartment | 2022 | A2 | 36.05 | 82.09 | 6.76 | NaN | NaN | Bal.whole mech.vent heat recvr | Masonry | 4.00 | NaN | NaN | NaN | Jul 14 2022 12:00AM
1043909 | Co. Monaghan | Detached house | 2013 | B1 | 90.57 | 334.35 | 19.66 | Electricity | Electricity | Bal.whole mech.vent heat recvr | Timber or Steel Frame | 1.00 | NaN | NaN | NaN | Jul 15 2022 12:00AM

Duplicate rows

Most frequently occurring

Index | CountyName | DwellingTypeDescr | YearofConstruction | EnergyRating | BerRating | GroundFloorArea(sq m) | CO2Rating | MainSpaceHeatingFuel | MainWaterHeatingFuel | VentilationMethod | StructureType | NoOfSidesSheltered | InsulationType | InsulationThickness | TotalDeliveredEnergy | DateOfAssessment | # duplicates
1492 | Co. Galway | Maisonette | 2000 | D2 | 299.11 | 83.60 | 71.23 | Electricity | Electricity | Natural vent. | Masonry | 3.00 | Factory Insulated | 40.00 | 11864.49 | Jun 9 2009 12:00AM | 37
675 | Co. Cork | Semi-detached house | 1995 | G | 476.80 | 29.91 | 113.55 | Electricity | Electricity | Natural vent. | Masonry | 2.00 | Loose Jacket | 25.00 | 6775.83 | Apr 13 2011 12:00AM | 36
1487 | Co. Galway | Maisonette | 2000 | D2 | 272.41 | 87.40 | 64.87 | Electricity | Electricity | Natural vent. | Masonry | 3.00 | Factory Insulated | 40.00 | 11281.88 | Jun 9 2009 12:00AM | 35
3693 | Co. Westmeath | Mid-terrace house | 2008 | B3 | 144.12 | 85.73 | 37.03 | Heating Oil | Heating Oil | Natural vent. | Masonry | 2.00 | Factory Insulated | 60.00 | 9995.01 | Jun 2 2010 12:00AM | 28
3337 | Co. Sligo | Apartment | 2004 | D1 | 232.34 | 63.90 | 55.33 | Electricity | Electricity | Natural vent. | Masonry | 3.00 | Factory Insulated | 40.00 | 7051.80 | Dec 20 2009 12:00AM | 18
800 | Co. Cork | Top-floor apartment | 2006 | C3 | 212.52 | 51.26 | 41.01 | Mains Gas | Mains Gas | Natural vent. | Masonry | 3.00 | Factory Insulated | 25.00 | 8974.92 | May 5 2009 12:00AM | 17
1619 | Co. Galway | Top-floor apartment | 2002 | C3 | 223.53 | 103.50 | 53.23 | Electricity | Electricity | Natural vent. | Masonry | 2.00 | Factory Insulated | 40.00 | 10913.94 | Sep 28 2009 12:00AM | 17
270 | Co. Cork | Apartment | 2006 | C1 | 154.50 | 120.19 | 30.06 | Mains Gas | Mains Gas | Natural vent. | Masonry | 3.00 | Factory Insulated | 25.00 | 15047.25 | May 5 2009 12:00AM | 16
1494 | Co. Galway | Maisonette | 2002 | D2 | 263.95 | 109.50 | 62.86 | Electricity | Electricity | Natural vent. | Masonry | 3.00 | Factory Insulated | 40.00 | 13637.69 | Sep 28 2009 12:00AM | 16
4281 | Dublin 11 | Mid-terrace house | 2000 | C1 | 166.99 | 81.32 | 34.96 | Mains Gas | Mains Gas | Natural vent. | Masonry | 3.00 | Factory Insulated | 30.00 | 10655.74 | Feb 20 2019 12:00AM | 16